Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation

Authors

Shikhar Sharma

Layla El Asri

Hannes Schulz

Jeremie Zumer

Abstract

Automated metrics such as BLEU are widely used in the machine translation literature. They have also been used recently in the dialogue community to evaluate dialogue response generation. However, previous work in dialogue response generation has shown that these metrics do not correlate strongly with human judgment in the non-task-oriented dialogue setting. Task-oriented dialogue responses are expressed on narrower domains and exhibit lower diversity. It is thus reasonable to think that these automated metrics would correlate well with human judgment in the task-oriented setting, where the generation task consists of translating dialogue acts into a sentence. We conduct an empirical study to confirm whether this is the case. Our findings indicate that these automated metrics correlate more strongly with human judgments in the task-oriented setting than has been observed in the non-task-oriented setting. We also observe that these metrics correlate even better for datasets that provide multiple ground-truth reference sentences. In addition, we show that some of the currently available corpora for task-oriented language generation can be solved with simple models, and we advocate for more challenging datasets.
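The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes a simplified word-overlap score (clipped unigram precision, one ingredient of BLEU, computed against one or more references) and a hand-written Spearman rank correlation. The function names are hypothetical, and the sentences and ratings in the usage lines are made-up toy values, not data from the paper.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Fraction of candidate tokens matched in the references.

    Each token's count is clipped to its maximum count in any single
    reference, as in BLEU's modified precision (unigrams only here).
    """
    cand_counts = Counter(candidate.lower().split())
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for tok, n in Counter(ref.lower().split()).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)
    matched = sum(min(n, max_ref_counts[tok]) for tok, n in cand_counts.items())
    return matched / sum(cand_counts.values())


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy usage with made-up outputs and made-up human ratings.
references = ["there is a cheap hotel in the north",
              "a cheap hotel is available in the north"]
outputs = ["a cheap hotel is in the north",
           "the hotel is cheap",
           "i like trains"]
human_ratings = [5, 3, 1]  # hypothetical 1-5 quality scores
metric_scores = [clipped_unigram_precision(o, references) for o in outputs]
print(spearman(metric_scores, human_ratings))
```

Adding a second reference can only raise or preserve the clipped count for each token, which is one intuition for why multiple references may help word-overlap metrics in a low-diversity setting.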

Similar resources

Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, many metrics have been used to evaluate search engines; nevertheless, researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. The purpose of this study was therefore to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...

Combining Task and Dialogue Streams in Unsupervised Dialogue Act Models

Unsupervised machine learning approaches hold great promise for recognizing dialogue acts, but the performance of these models tends to be much lower than the accuracies reached by supervised models. However, some dialogues, such as task-oriented dialogues with parallel task streams, hold rich information that has not yet been leveraged within unsupervised dialogue act models. This paper invest...

Modelling Denial of Expectation in Dialogue: Issues in Interpretation and Generation

We aim to model the semantics of “but” in dialogue, focusing on cases in which it signals denial of expectation (DofE) across speakers. We present an algorithm that predicts the defeated expectation from the perspective of the hearer of the DofE, and we consider differences between task-oriented dialogue (TOD) and non-task-oriented dialogue (NTOD). We motivate this work by showing how it update...

Frames: a corpus for adding memory to goal-oriented dialogue systems

This paper presents the Frames dataset, a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tracked simultaneously. We propose a baseline model for thi...

To Plan or not to Plan? Discourse Planning in Slot-Value Informed Sequence to Sequence Models for Language Generation

Natural language generation for task-oriented dialogue systems aims to effectively realize system dialogue actions. All natural language generators (NLGs) must realize grammatical, natural and appropriate output, but in addition, generators for task-oriented dialogue must faithfully perform a specific dialogue act that conveys specific semantic information, as dictated by the dialogue policy of ...

Journal:

CoRR

Volume abs/1706.09799

Publication date: 2017
